Legendre polynomial
Improving KAN with CDF normalization to quantiles
Data normalization is crucial in machine learning and is usually performed by subtracting the mean and dividing by the standard deviation, or by rescaling to a fixed range. Copula theory [1], popular in finance, instead normalizes to approximate quantiles by transforming x → CDF(x) with an estimated CDF/EDF (cumulative/empirical distribution function), giving a nearly uniform distribution on [0, 1] and allowing simpler representations that are less likely to overfit. This approach seems nearly unknown in machine learning; therefore, as proposed in [2], we would like to present some of its advantages using the example of the recently popular Kolmogorov-Arnold Networks (KANs), improving predictions from Legendre-KAN [3] by just switching rescaling to CDF normalization. Additionally, in the HCR interpretation, the weights of such neurons are mixed moments providing local joint distribution models, which also allow propagating probability distributions and changing the propagation direction. Data normalization is very useful for various types of analysis, for example, through batch normalization in neural networks [4].
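The normalization described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `edf_normalize` and `shifted_legendre`, the rank offset of 0.5, and the choice of an orthonormal Legendre basis on [0, 1] are assumptions made for illustration only.

```python
import math


def edf_normalize(values):
    """Map each value to (rank + 0.5) / n, an empirical-CDF estimate in (0, 1).

    Assumption: ties are not averaged; tied values simply receive
    consecutive ranks in their original order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order):
        u[i] = (rank + 0.5) / n
    return u


def shifted_legendre(u, degree):
    """Evaluate orthonormal Legendre polynomials on [0, 1] at u, degrees 0..degree."""
    x = 2.0 * u - 1.0  # map [0, 1] to the standard interval [-1, 1]
    p = [1.0, x]
    for k in range(1, degree):
        # Bonnet recurrence: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    # sqrt(2k + 1) makes each polynomial have unit L2 norm on [0, 1]
    return [math.sqrt(2 * k + 1) * p[k] for k in range(degree + 1)]


raw = [10.0, -3.0, 2.5, 100.0]
quantiles = edf_normalize(raw)
features = [shifted_legendre(u, 2) for u in quantiles]
```

Because the normalized values depend only on ranks, the outlier 100.0 lands at 0.875 rather than stretching the scale, which is the property that mean/standard-deviation rescaling lacks.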
UnHiPPO: Uncertainty-aware Initialization for State Space Models
Lienen, Marten, Saydemir, Abdullah, Günnemann, Stephan
State space models are emerging as a dominant model class for sequence problems with many relying on the HiPPO framework to initialize their dynamics. However, HiPPO fundamentally assumes data to be noise-free; an assumption often violated in practice. We extend the HiPPO theory with measurement noise and derive an uncertainty-aware initialization for state space model dynamics. In our analysis, we interpret HiPPO as a linear stochastic control problem where the data enters as a noise-free control signal. We then reformulate the problem so that the data become noisy outputs of a latent system and arrive at an alternative dynamics initialization that infers the posterior of this latent system from the data without increasing runtime. Our experiments show that our initialization improves the resistance of state-space models to noise both at training and inference time. Find our implementation at https://cs.cit.tum.de/daml/unhippo.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada (0.04)
Data-driven Approach for Interpolation of Sparse Data
Ferguson, R. F., Ireland, D. G., McKinnon, B.
Extracting information about hadron resonances requires fitting theoretical models to experimental data. However, this data often comes from different experiments of different physics quantities in varying kinematic regions; studying coupled channels with different kinematic coverages and binning can make direct comparison challenging. The consistency of these datasets directly impacts the quality of the fit, thus making it difficult to accurately constrain the theoretical models. Sparse datasets in key kinematic regions further complicate the quantification of uncertainties, often requiring arbitrary weighting that may introduce bias. A robust approach to solving these problems involves utilising Gaussian Processes (GPs), a Bayesian inference machine learning technique that provides probabilistic predictions for unknown datapoints. Unlike traditional machine learning methods, GPs do not require any training; instead, they operate on three fundamental assumptions: 1. Some kernel function can be defined to measure the covariance between known datapoints; 2. This same kernel function can be used to predict the covariance between unknown datapoints; 3. Some idea of the form of the posterior distribution is known (e.g.
- Europe > United Kingdom (0.14)
- North America > United States > Virginia (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Reconstruction of frequency-localized functions from pointwise samples via least squares and deep learning
Neuman, A. Martina, Pineda, Andres Felipe Lerma, Bramburger, Jason J., Brugiapaglia, Simone
Recovering frequency-localized functions from pointwise data is a fundamental task in signal processing. We examine this problem from an approximation-theoretic perspective, focusing on least squares and deep learning-based methods. First, we establish a novel recovery theorem for least squares approximations using the Slepian basis from uniform random samples in low dimensions, explicitly tracking the dependence of the bandwidth on the sampling complexity. Building on these results, we then present a recovery guarantee for approximating bandlimited functions via deep learning from pointwise data. This result, framed as a practical existence theorem, provides conditions on the network architecture, training procedure, and data acquisition sufficient for accurate approximation. To complement our theoretical findings, we perform numerical comparisons between least squares and deep learning for approximating one- and two-dimensional functions. We conclude with a discussion of the theoretical limitations and the practical gaps between theory and implementation.
- Europe > Austria > Vienna (0.14)
- North America > United States > New York (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models
Several improvements have been proposed over the baseline Absolute Positional Encoding (APE) method used in the original Transformer. In this study, we aim to investigate the implications of inadequately representing positional encoding in higher dimensions on crucial aspects of the attention mechanism, the model's capacity to learn relative positional information, and the convergence of models, all stemming from the choice of sinusoidal basis functions. Through a combination of theoretical insights and empirical analyses, we elucidate how these challenges extend beyond APEs and may adversely affect the performance of Relative Positional Encoding (RPE) methods, such as Rotary Positional Encoding (RoPE). Subsequently, we introduce an innovative solution termed Orthogonal Polynomial Based Positional Encoding (PoPE) to address some of the limitations associated with existing methods. The PoPE method encodes positional information by leveraging orthogonal Legendre polynomials. Legendre polynomials as basis functions offer several desirable properties for positional encoding, including improved correlation structure, non-periodicity, orthogonality, and distinct functional forms among polynomials of varying orders. Our experimental findings demonstrate that transformer models incorporating PoPE outperform baseline transformer models on the Multi30k English-to-German translation task, thus establishing a new performance benchmark. Furthermore, PoPE-based transformers exhibit significantly accelerated convergence rates. Additionally, we will present novel theoretical perspectives on position encoding based on the superior performance of PoPE.
- Africa (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Dominican Republic (0.04)
- Research Report > New Finding (0.54)
- Research Report > Promising Solution (0.34)
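The idea in the PoPE abstract above, encoding positions with Legendre polynomials instead of sinusoids, can be sketched roughly as follows. This is only an illustration under stated assumptions and not the paper's actual scheme: the helper names, the linear mapping of positions onto [-1, 1], and the use of degrees 1 through `dim` are all hypothetical choices.

```python
def legendre_values(x, max_degree):
    """Return [P_0(x), ..., P_max_degree(x)] using the Bonnet recurrence."""
    p = [1.0, x]
    for k in range(1, max_degree):
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    return p[: max_degree + 1]


def legendre_position_encoding(seq_len, dim):
    """Build a seq_len x dim table whose column j holds P_{j+1} at each position.

    Assumption: positions 0..seq_len-1 are mapped linearly onto [-1, 1];
    the constant P_0 is skipped because it carries no positional information.
    """
    rows = []
    for pos in range(seq_len):
        x = 2.0 * pos / (seq_len - 1) - 1.0 if seq_len > 1 else 0.0
        rows.append(legendre_values(x, dim)[1:])
    return rows


encoding = legendre_position_encoding(seq_len=5, dim=3)
```

Each column is a polynomial of a different degree, so the columns are non-periodic and have distinct functional forms, which are the properties the abstract attributes to this basis.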
Sequential transport maps using SoS density estimation and $\alpha$-divergences
Zanger, Benjamin, Cui, Tiangang, Schreiber, Martin, Zahm, Olivier
Transport-based density estimation methods are receiving growing interest because of their ability to efficiently generate samples from the approximated density. We further investigate the sequential transport maps framework proposed in arXiv:2106.04170 and arXiv:2303.02554, which builds on a sequence of composed Knothe-Rosenblatt (KR) maps. Each of those maps is built by first estimating an intermediate density of moderate complexity, and then by computing the exact KR map from a reference density to the precomputed approximate density. In our work, we explore the use of Sum-of-Squares (SoS) densities and $\alpha$-divergences for approximating the intermediate densities. Combining SoS densities with $\alpha$-divergences interestingly yields convex optimization problems which can be efficiently solved using semidefinite programming. The main advantage of $\alpha$-divergences is to enable working with unnormalized densities, which provides benefits both numerically and theoretically. In particular, we provide two new convergence analyses of the sequential transport maps: one based on a triangle-like inequality and the second on information geometric properties of $\alpha$-divergences for unnormalized densities. The choice of intermediate densities is also crucial for the efficiency of the method. While tempered (or annealed) densities are the state-of-the-art, we introduce diffusion-based intermediate densities which make it possible to approximate densities known from samples only. Such intermediate densities are well-established in machine learning for generative modeling. Finally, we propose and try different low-dimensional maps (or lazy maps) for dealing with high-dimensional problems and numerically demonstrate our methods on several benchmarks, including Bayesian inference problems and unsupervised learning tasks.
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
- North America > United States > Indiana (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
No-Regret Reinforcement Learning in Smooth MDPs
Maran, Davide, Metelli, Alberto Maria, Papini, Matteo, Restelli, Marcello
Obtaining no-regret guarantees for reinforcement learning (RL) in the case of problems with continuous state and/or action spaces is still one of the major open challenges in the field. Recently, a variety of solutions have been proposed, but besides very specific settings, the general problem remains unsolved. In this paper, we introduce a novel structural assumption on the Markov decision processes (MDPs), namely $\nu$-smoothness, that generalizes most of the settings proposed so far (e.g., linear MDPs and Lipschitz MDPs). To face this challenging scenario, we propose two algorithms for regret minimization in $\nu$-smooth MDPs. Both algorithms build upon the idea of constructing an MDP representation through an orthogonal feature map based on Legendre polynomials. The first algorithm, Legendre-Eleanor, achieves the no-regret property under weaker assumptions but is computationally inefficient, whereas the second one, Legendre-LSVI, runs in polynomial time, although for a smaller class of problems. After analyzing their regret properties, we compare our results with state-of-the-art ones from RL theory, showing that our algorithms achieve the best guarantees.
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Poland (0.04)
Efficiently Solving High-Order and Nonlinear ODEs with Rational Fraction Polynomial: the Ratio Net
Qin, Chenxin, Liu, Ruhao, Li, Maocai, Li, Shengyuan, Liu, Yi, Zhou, Chichun
Recent advances in solving ordinary differential equations (ODEs) with neural networks have been remarkable. Neural networks excel at serving as trial functions and approximating solutions within functional spaces, aided by gradient backpropagation algorithms. However, challenges remain in solving complex ODEs, including high-order and nonlinear cases, emphasizing the need for improved efficiency and effectiveness. Traditional methods have typically relied on established knowledge integration to improve problem-solving efficiency. In contrast, this study takes a different approach by introducing a new neural network architecture for constructing trial functions, known as ratio net. This architecture draws inspiration from rational fraction polynomial approximation functions, specifically the Padé approximant. Through empirical trials, it is demonstrated that the proposed method exhibits higher efficiency compared to existing approaches, including polynomial-based and multilayer perceptron (MLP) neural network-based methods. The ratio net holds promise for advancing the efficiency and effectiveness of solving differential equations.
- North America > United States > New York (0.04)
- Asia > China > Yunnan Province (0.04)
- Asia > China > Jiangxi Province > Nanchang (0.04)
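The rational-fraction form that the ratio net abstract above draws on can be illustrated with a small sketch. It shows only a classical Padé approximant evaluated as a ratio of two polynomials, not the ratio net architecture itself; the helper names `horner` and `rational` are hypothetical, and the coefficients are the standard [1/1] Padé approximant of exp(x), namely (1 + x/2) / (1 - x/2).

```python
import math


def horner(coeffs, x):
    """Evaluate a polynomial whose coefficients are given in ascending degree."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def rational(num, den, x):
    """Evaluate the rational function num(x) / den(x)."""
    return horner(num, x) / horner(den, x)


# [1/1] Pade approximant of exp(x): (1 + x/2) / (1 - x/2)
EXP_PADE_NUM = [1.0, 0.5]
EXP_PADE_DEN = [1.0, -0.5]

approx = rational(EXP_PADE_NUM, EXP_PADE_DEN, 0.1)
error = abs(approx - math.exp(0.1))
```

In a trainable variant, the numerator and denominator coefficients would be learned parameters rather than fixed constants; how the ratio net parameterizes them is not specified in the abstract.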